SYSTEM AND METHOD FOR COMPATIBLE 2D/3D (FULL SPHERE WITH HEIGHT)

SURROUND SOUND REPRODUCTION 

BACKGROUND OF THE INVENTION 

This application claims the priority of provisional application 60/455,497, filed 18 March
2003, which is hereby incorporated herein by reference. The inventor's paper entitled "Scalable
Tri-play Recording for Stereo, ITU 5.1/6.1 2D, and Periphonic 3D (with Height) Compatible
Surround Sound Reproduction", presented at the 115th convention of the Audio Engineering
Society in October of 2003, is hereby incorporated herein by reference in its entirety.

Lifelike reproduction of sound has long been a subject of scientific exploration and 
experimentation. While we may not have completed this exploration, we now know enough to 
record and reproduce a very good approximation of the lifelike sounds of, for example, musical 
performance in an acoustic space, and other applications. We do know that it is essential to 
preserve true three-dimensionality of the arrivals at the ear of both direct and reflected sounds, or 
close approximations of their directions of arrival. We say "true three-dimensionality" ("3D") 
because the term is much misused. For example, methods are often termed 3D where 
reproducers (e.g., loudspeakers) are arranged only in the horizontal plane. These methods can 
only reliably preserve horizontal angles of sound arrivals where the listener is at the center of a 
horizontal circle. However, in live listening in an acoustic space, reflections also arrive from 
above and below, at vertical angles of elevation, referred to as "height", and resulting in truly 
natural "periphonic" hearing.

For lifelike reproduction, there are both (a) important reasons why the most reliable way
to reproduce height is by locating loudspeakers above and below the listener, who is now at the 
center of a sphere, not just a circle, and (b) important reasons why height must also be preserved 
in the first place. 

Regarding point (a) above, in the past, less reliable methods have attempted to generalize 
an important aspect of human Head-Related Transfer Functions ("HRTF") using generalized 
filters or so-called "dummy-head" microphones, intended to deliver to inside the two ear canals 
of the listener what was recorded at the two ear canals of the dummy head. The problem is that 
the human mechanism for determining sound arrivals from above or below is the pinna, or outer
ear. Folds of the pinna cause reflections of higher frequency sounds either partially to reinforce
or partially to cancel, or attenuate, depending on both the frequency and the direction of the
sound, both horizontal and vertical. But each human individual's pinna are as unique as a
fingerprint, so generalized filters or generalized "dummy pinna" work more or less poorly for
each listener. Miniature microphones placed within the ear canals of the recordist/listener result
in more lifelike reproduction, but only with that one person doing the recording and/or listening.

For lifelike reproduction by a group of listeners - such as in listening to recorded music 
in a home theater, training in a simulator, or virtual reality for computer multi-media, or riding 
an amusement ride - loudspeakers must be located above and below as well as around the 
listeners. Each listener's pinna, in "agreement" with other aspects of their individual HRTF, will 
determine for them both the azimuth and elevation of each sound, just as they have learned these
complex relationships for themselves since childhood.

Regarding point (b) above, why must true 3D (i.e., with height) be preserved in the first
place? The reason is that humans learn sound directionality by relating seeing sources of sound
with the hearing mechanisms described above. Through a complex ear-brain response the
listener knows the direction of a sound - above or below as well as horizontally - even when
facing another way or with eyes closed. In acoustic spaces, unseen reflections arrive at different 
times, building up to steady state, then collapse in the same order when the source of the sound 
stops. Each arrival and "departure" from each direction is tonally "colored" by the pinna. 
Musicians hear this same complex interplay and form each note, phrase, even pause, to be 
"musically correct", playing the acoustic as an extension of their instrument. The "tonality" or
timbre of their guitar, piano, or violin would sound very different in a different space. They will 
play differently in a different hall to be musically correct in that hall, such as playing faster or 
more legato in a small space and slower and more pizzicato in a large one. Listeners in the same 
space learn this "musical language" and appreciate the music more when they agree it is correct.
But take away height reflections from the ceiling or acoustic clouds above the stage and the 
timbre changes dramatically. 

So for lifelike reproduction of natural sounds such as music, spherically positioned
reproducers of sound are a requirement. 

Numerous approaches termed "three-dimensional" are in fact only two-dimensional since 
they use speakers only in the horizontal plane. If the listener perceives any height sounds, they 
can only be due to the acoustics of the listening environment, which are invalid in reproducing 
the space where the music was recorded. Other approaches attempt to simulate height auditory
"cues", or signals, to the ear-brain system; however, these cannot be generalized reliably to a
lifelike degree for all listeners because their pinna are as individual as their fingerprints, as described
above. If the goal is to believably reproduce the recorded space, then the listener will believe he
has been "transported" to that space and is no longer in the listening space. If the recorded space 
is an acoustic one with reflective ceiling and floor elements, lifelike believability requires 
vertically-arriving sounds to be preserved. Since we cannot successfully generalize pinna 
colorations (e.g., by using filters and/or dummy heads) that connote height, we can best 
reproduce height cues by using loudspeakers above and below the listeners. But an infinite 
number of loudspeakers and channels as in real life would be infinitely impractical. 

Prior art systems, such as 1st Order Ambisonics, create a reasonable approximation of
three-dimensionality using four channels and a minimum of eight loudspeakers. Ambisonics has
not succeeded in the marketplace for a variety of reasons, not the least of which is the fact that
Ambisonics does not produce a lifelike reproduction of sound in front of the listener, where the
ear-brain "perceptualization" is most acute.

Another prior art system, called Ambiophonics, uses a two-channel binaural-based
approach that precisely positions sounds across a 120 degree arc in front of a listener where such
localization is most important for lifelike hearing. In order to localize frontal sounds widely yet
accurately, Ambiophonics uses two closely-spaced speakers, called a "stereo dipole" or
"Ambiopole", and transaural crosstalk cancellation. However, Ambiophonics is inherently
two-dimensional and incapable of producing three-dimensional sound with height.

Prior art monaural systems sounded correct tonally but had a "stage door" effect: the sound was
localized at a point in 2D, as if coming through a narrow opening, say, in an orchestra shell wall.
Prior art stereo systems, while providing spaciousness in sound in two dimensions, suffer from
lack of localization as the speakers are typically placed at the front left and front right positions,
thereby leaving a large gap between the speakers. Other prior art systems, such as ITU 5.1/6.1
and stereo, favor spaciousness and simulating tonality at the price of accurate localization - as
though mutually exclusive. ITU 5.1/6.1 systems extend the stereo concept to envelop listeners
but only in two dimensions. A height component is lacking.

Another prior art system is WaveField Synthesis ("WFS"). The WFS system is limited to 
two dimensions and therefore lacks the directionality of height and the natural timbral quality 
achievable by systems and methods exercising the present invention. Furthermore, WFS 
requires upwards of 36 speakers and is impractical at present in needing as many channels for 
distribution and digital signal processing as for reproduction. 

Yet other prior art systems, known collectively as Higher Order Ambisonics ("HOA"),
likewise have deficiencies. Along with the deficiencies previously noted for Ambisonic
systems, HOA systems require nine or more channels for Ambisonic components for a total of
11 or more distribution channels. Currently, six full-range channels is the limit of
distribution media such as DVD-A, SACD, and DTS-CD.

No prior art systems have yet been able to reproduce accurate 3D sound - with height and
accurate spaciousness, tonality, and localization. The present invention produces lifelike 3D
sound with correct spatial impression, timbre (tonality), and localization. Furthermore,
embodiments of the present invention play compatibly in stereo, ITU 5.1/6.1, full 3D using
available 6-channel media, and full 3D using 10 or more speakers in a home theater or
height-modified cinema.

It is an object of the present disclosure to provide a novel system and method for 
accurately reproducing a 3D sound field.

It is another object of the present disclosure to provide a novel system and method for
combining accurate reproduction of "front stage sound" with accurate three-dimensional 
localization of sound to produce a sound field with height and accurate spaciousness, tonality, 
and localization. 

It is yet another object of the present disclosure to provide a novel system and method for 
producing a signal which accurately reproduces a 3D sound field that is also capable of play back 
on current surround 2D sound systems without the use of a decoder or the need to add additional
speakers. 

It is still another object of the present disclosure to provide a novel system and method 
for providing a transformation matrix for mapping a 3D sound field into a signal for providing a 
2D sound field without the need for a decoder. 

It is still yet another object of the present disclosure to provide a novel system and 
method for providing a reconstitution matrix for accurately reproducing a 3D sound field. 

It is a further object of the present disclosure to provide a novel system and method for a
microphone array capable of capturing a sound field in three dimensions.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1A is a high level block diagram illustrating the flow of information from a
microphone array through an encoder, a decoder, to a set of 3D speakers according to 
embodiments of the present disclosure.

Figure 1B is a high level block diagram illustrating the flow of information from a
microphone array through an encoder to a set of 2D speakers according to embodiments of the 
present disclosure. 

Figures 2A - 2C are a depiction of the top, front, and side views of an embodiment of a
hybrid microphone array according to an aspect of the present disclosure.

Figures 3A - 3F each depict one of six transform modes according to aspects of the
present disclosure.

Figures 4A - 4F each depict one of the six 3D transform mode matrices of Figures 3A -
3F, respectively.

Figures 5A - 5F each depict one of the six reconstitution matrices of Figures 4A - 4F,
respectively.

Figure 6 is an illustration of a speaker layout for an embodiment of the present disclosure. 

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 

An embodiment of the present disclosure may comprise (a) a microphone array capable 
of capturing sounds in three dimensions and using, perhaps, six recording channels; (b) an 
encoder for "transformation" of recordings from the microphone array so that the captured 
sounds may be encoded on standard media such as compact discs ("CDs") or digital video discs 
("DVDs") such that playing the media requires no decoder for replay on, for example, ITU 
5.1/6.1 systems; (c) a decoder for lossless "reconstituting" of 3D information of the captured
sounds for use with a 3D speaker layout; and (d) a speaker layout for 3D reproduction of the
captured sounds, or a standard ITU 5.1/6.1 speaker layout. It shall be understood by those of
skill in the art that an ITU 5.1/6.1 system does not require a 3D speaker layout. The novel
system and method are sometimes referred to herein as "PerAmbio 3D/2D" or simply 
"PerAmbio". 

For example, Figure 1A is an overall, high-level block diagram of an embodiment of the
present disclosure illustrating the flow of information from a microphone array 10 through an
encoder 12, a decoder 14, to a 3D speaker arrangement 16. Sound field 2 impinges on the
microphone array 10 which produces a microphone signal ("Pin"). The microphone signal may
be a six channel signal. The encoder 12 converts Pin to an encoded signal ("Sout"). The encoded
signal is sent to the decoder 14 which produces a decoded signal ("Pout"). Pout is applied to the
3D speaker arrangement 16 to produce a 3D sound field that is an accurate reproduction of the
sound field 2.

Figure 1B is an overall, high-level block diagram of an embodiment of the present
disclosure illustrating the flow of information from a microphone array 10 through an encoder 12
to a 2D speaker arrangement 18. Sound field 2 impinges on the microphone array 10 which
produces a microphone signal ("Pin"). The microphone signal may be a six channel signal. The
encoder 12 converts Pin to an encoded signal ("Sout"). The encoded signal is applied to the 2D
speaker arrangement 18 to produce a 2D sound field that is a representation of the sound field 2.

The details of the components of the systems in Figures 1A and 1B will be discussed
below.

Microphone Array

Embodiments of the present invention may include a specialized microphone array for 
recording the necessary information of the sound field 2 so as to accurately reproduce the sound
field with a speaker arrangement. 

Figures 2A - 2C depict a novel microphone array according to embodiments of the 
present disclosure. The microphone array, sometimes referred to as the "PerAmbio 3D/2D 
microphone array" is a hybrid array comprising a "soundfield" array for four Ambisonic signals 
(W, X, Y, Z), also known as B-Format channels, and a baffled, substantially ellipsoidal array for
Ambiophonic signals (FL, FR, BL, BR). 

1st order so-called "B-format" Ambisonic signals, called W, X, Y, and Z, represent
pressure (omni-directional), and forward-, leftward-, and upward-facing pressure-gradient
(velocity) microphone elements, respectively, as is known in prior art. The B-format signals in
combination can approximately represent the sound of plane waves arriving at a listener from
any direction in 3-dimensions. They contribute the "ambience" component of PerAmbio 3D/2D.
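
As a hedged illustration of the B-format description above, the following Python sketch encodes one sample of a plane wave into W, X, Y, and Z. It assumes the conventional first-order encoding equations, including the customary 1/sqrt(2) weighting on W, which this document does not itself state. The function name encode_b_format and the angle conventions (azimuth positive toward the left, elevation positive upward) are illustrative choices only.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample of a plane wave into first-order B-format.

    Assumed conventions (not taken from this document):
    azimuth 0 = straight ahead, positive = toward the left;
    elevation 0 = horizontal, positive = upward.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional pressure, conventional weight
    x = sample * math.cos(az) * math.cos(el)  # forward-facing pressure gradient
    y = sample * math.sin(az) * math.cos(el)  # leftward-facing pressure gradient
    z = sample * math.sin(el)                 # upward-facing pressure gradient
    return w, x, y, z


# A source straight ahead excites only W and X; a source overhead only W and Z.
front = encode_b_format(1.0, 0.0, 0.0)
overhead = encode_b_format(1.0, 0.0, 90.0)
```

In this sketch the Z component is the only one that changes sign between sounds from above and from below, which is why it carries the height information.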

An ellipsoid 20 is approximately head-shaped and contributes that portion of human 
HRTF (head related transfer function) that can be successfully generalized - the human head 
spacing and "shadowing" between the ears. Head-spacing causes time delay, or interaural time 
delay ("ITD") while the head-shadowing describes the loss of level at frequencies greater than
approximately 700 Hz, known as interaural level difference ("ILD"), of sounds originating from
the side of the head opposite each ear. The inventive microphone array is designed with its
imprimatur for these aspects of HRTF because they are similar in nearly all individuals. They
contribute a great deal to horizontal localization of sounds - but not all. As discussed above,
a listener's individual pinna cues, learned through experience, must agree with head size and
shadowing cues, or the listener is confused, and deems the sound not lifelike. The pinna are
highly individual, unlike the "standard" pinna configuration of the dummy head used in prior
art microphone arrays. Since the inventive microphone array is pinnaless, the only "pinna" in the
system are the listener's.
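
To give a sense of the magnitude of the interaural time delay discussed above, the following Python sketch evaluates the textbook Woodworth spherical-head approximation, ITD = (r / c) * (theta + sin(theta)). This formula is a swapped-in illustration and is not part of the present disclosure, which uses an ellipsoid rather than a sphere. The head radius, the speed of sound, and the function name woodworth_itd are assumed values and names.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumed average head radius, not from this document
SPEED_OF_SOUND_M_S = 343.0   # assumed speed of sound in air


def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source at 0..90 degrees azimuth."""
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch covers azimuths from 0 to 90 degrees only")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


itd_front = woodworth_itd(0.0)   # no delay for a source straight ahead
itd_side = woodworth_itd(90.0)   # roughly two thirds of a millisecond at the side
```

Under these assumptions the delay grows from zero at the front to a little under 0.7 ms at the side, a range that depends only on head size and is therefore one of the cues that generalizes across listeners.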

The microphone baffling 22 attenuates sounds from behind and above in order to avoid
interference with the soundfield array that might otherwise cause undesirable ambiguous images
and comb filtering for critical frontal sounds. Figures 2A - 2C show a horizontal and vertical
frontal acceptance angle. In one preferred embodiment, the horizontal frontal acceptance angle
is 120 degrees and the vertical frontal acceptance angle is 150 degrees. Side and top baffles use the
boundary-layer effect with small microphone diaphragms located at the intersection of these planes and the
"plane" tangent to the ellipsoid. This avoids high frequency reflections that otherwise would
cause undesirable comb filtering and smearing of the microphone's impulse response, which is
critically important in this application. The baffles provide 6 dB of acoustic gain above 500 Hz,
which, when compensated with equalization, results in a +6 dB increase in signal-to-noise above
that frequency, and makes possible the use of small diaphragm microphone elements. The
microphone may weigh approximately 7 kg (15 lb) and can be mounted on a stand or suspended
and tilted as needed.

Microphone positions are designated on Figures 2A - 2C as FL (front left) 24, FR (front 
right) 25, BL (back left) 26, and BR (back right) 27. The vectors associated with FL, FR, BL, 
and BR indicate the general direction of sound which impinges on each of the microphones. In 
embodiments of the microphone array which use 6 channels, either the FL, FR microphone pair
or a mix adding the FL, FR pair to the BL, BR microphone pair, is used. When all four
microphones are in use, an additional pair of channels is needed. 

For compatibility with ITU-R BS.775.1 two dimensional surround systems, the 
microphone array may be fitted with the BL, BR microphone pair on the back of the baffle and 
may be positioned in coincidence (approximately 25 mm or less in 3-dimensional space) from
the frontal pair (FL, FR). For anechoic recordings such as out of doors, the baffle may be
typically flat and the horizontal and vertical acceptance angles are therefore 180 degrees in front or back.
Recordings made with the FL, FR, BL, BR microphones are compatible with standard ITU
5.1/6.1 systems. Playback in home theaters with ITU 5.1/6.1 systems, as discussed previously,
results in two dimensional surround sound accurate over 360 degrees when played using two cross-talk
cancelled stereo dipoles (front and back). Playback can be three dimensional, with an
appropriate speaker arrangement, if the B-format microphone signals are captured as well. 
PerAmbio three dimensional B-format signals may also be generated post-production using hall 
impulse responses and convolution of the front Ambiophone channels. The PerAmbio outputs of 
the present invention may be augmented with "spot" microphones highlighting individual 
instruments as desired by the recording or mixing engineer using methods specific to the present 
invention. 

2D/3D Playback System 

The present disclosure describes an encoder for "transformation" processing of 3D 
recordings in a form compatible with standard ITU 5.1/6.1 systems such that no decoder is 
needed. In doing so, the mastering engineer may select a useful "mode" that mathematically 
maps the height information in a way that most suits the performance or venue, e.g., opera,
recital, arena concert, movie scene, etc. Eighty combinations of transformation modes are
possible, but only a dozen or so are useful to the experienced recording engineer. The 
transformation mode selected by the recording engineer is reversible and changeable by the 
mastering engineer during preparations for mass distribution on CD or DVD media, for example. 
Transformation makes possible not just uncompromised, but potentially improved, 5.1/6.1, CD, 
DVD, etc. two dimensional media that contains embedded information for lossless 3D 
"reconstitution", described below, for example, when a listener adds a 3D decoder and 3D 
speaker arrangement. 

When the user elects to expand to three dimensional sound from a prior art two 
dimensional system, he adds a "reconstitution" decoder 14 of the present invention, or a 
receiver/audio controller so-equipped. The reconstitution decoder 14 both: (a) recovers the three 
dimensional information according to the mode selected by the recording engineer; and (b) 
develops outputs for feeding, for example, 10, 14, or 26 loudspeakers, including four or more 
above and below the horizontal plane, depending on the user's resources. In DVD-A, the 
transformation mode selected by the recording engineer could be encoded in meta-data such that 
the user's receiver/decoder 14 could automatically select the mode for reconstitution. In 
addition, the transformation "mode" selected by the recording engineer or mastering engineer, is 
reversible and changeable by the advanced user as desired in order to enhance reproduction in 
two dimensional ITU 5.1/6.1 systems. The reconstitution decoder 14 of the present invention 
has been realized in DSP (digital signal processing) prototype form, has been demonstrated, and 
is ready as software for a programmable DSP chip ready for manufacture of consumer receivers 
and professional decoders.

In addition to adding a reconstitution decoder 14, in order to get true 3D reproduction, the
user must add, for example, four or five or more speakers (and power amplifiers) for a total of
10, 14, or 26 depending on the user's resources. Ten speakers is the experimentally determined
minimum for lifelike results. Referring now to Figure 6, which is a depiction of a twelve speaker
arrangement according to an embodiment of the present disclosure, the two frontal speakers (41,
42) typically are of higher quality and power than the eight ambience speakers (43, 44, 45, 46,
47, 48, 49, 50) and two back speakers (51, 52) which may be of "satellite-quality" and lower in
power. Speaker locations are somewhat flexible with decreasing quality of results if varied from
recommended positions of the present invention. Whether in the recommended positions or not, 
the reconstitution decoder 14 of the present invention may be programmed by the user to reflect 
the exact loudspeaker locations during setup. The "Listening Area" ("Sweet Spot") is enlarged 
due to the hybrid nature of the present invention to accommodate 6 persons or more in a space of 
size commonly used for home theaters. 

Encoder 

Figures 3A - 3F depict six possible transform modes the inventor has identified as useful.
If metadata permitted, the recording engineer could have available all 80 combinations (3^4 - 1)
considered for encoding 3D directionality into 6 full-range ITU compatible media channels for
direct replay in 5.1/6.1. For 3D replay, decoding corresponding to the recording mode is 
implemented preferably in a DSP chip, but other implementations are contemplated. It may also 
be possible for users to download new matrices via the Internet. 
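
The count of 80 combinations can be checked with a short enumeration. The Python sketch below is illustrative only: the document gives the count but does not say which four items each take three states, so the choice of the C, SC, SL, and SR channels and the state labels "down", "level", and "up" are hypothetical. Only the arithmetic (3 to the 4th power, minus the one setting with no height at all) comes from the text.

```python
from itertools import product

# Hypothetical labels: the source states the count but not what is being combined.
CHANNELS = ("C", "SC", "SL", "SR")
STATES = ("down", "level", "up")

# Every assignment of one state to each of the four channels: 3 ** 4 settings.
all_settings = list(product(STATES, repeat=len(CHANNELS)))

# Exclude the single setting in which every channel stays level (no height encoded).
height_settings = [s for s in all_settings if any(state != "level" for state in s)]

total_count = len(all_settings)
height_count = len(height_settings)
```

Here total_count is 81 and height_count is 80, matching the figure quoted above.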

The inventor has identified six useful "modes" for use in situations such as music
recording, cinema ambience, multi-channel broadcast, etc. A mode chosen during recording may
be changed in post-production, or by a user with a "smart decoder" reconstituting original
channels and making a new transformation. Changing the tilt of a raised (suspended) 
microphone is also easily done. For example, in DVD-A mastering, a flag is set in meta data of 
the tri-play 3D/2D disc for automatic selection by replay equipment. 

For ease of use, mnemonics describe the three basic modes, i (Figure 3A), j (Figure 3B),
& k (Figure 3C), in terms of ITU 5.1/6.1 channels C (center), SC (surround center), SL (surround 
left), SR (surround right), L (left), and R (right), illustrated as follows with the source of sound to 
the right: 

Figure 3A: "i" represents C and SC "inclined" upward while SL and SR incline
downward. 

Figure 3B: "j" "juxtaposes" the C, SC, SL, and SR channels from "i". 

Figure 3C: "k" is lying on its back, with C and SC angling upward from the corner
channels (L, R, SL, SR) which lie flat.

Three tilted variants i' (Figure 3D), j' (Figure 3E), and k' (Figure 3F) rotate C, SC, SL,
and SR with respect to L, R by any practical angle, e.g. -30 degrees, in order to raise the microphone
(suspended or on a high stand). The output of the baffled ambiophone varies only slightly with 
height incidence, so physical tilting is inconsequential for the FL, FR or BL, BR channels. 

From experience, recording engineers might identify applications described below for 
each of the six modes (keeping in mind they can be changed in post or replay):

Figure 3A ("i"): the microphone array is placed at source level (L, R), below acoustic
shell reflections (C), e.g. an outdoor amphitheater event, with audience. 

Figure 3B ("i'"): the array is on a high stand or hanging in an opera house or symphony 
hall, the orchestra widely spaced in a pit or strings downstage (L, R), singers or winds upstage 
(C), hall ambience back (SL, SR) & up (SC). 

Figure 3C ("j"): the array is more closely placed before a small ensemble at source level 
for direct sound and early floor and sidewall reflections (L, R), higher direct solo and ceiling 
reflections (C), and hall ambience from back-up (SL, SR) and back-down (SC). 

Figure 3D ("j'"): the array hangs closer to a proscenium to pick up downstage sounds (L,
R), upstage drama (C), highback ambience (SL, SR), and audience (SC). 

Figure 3E ("k"): the microphone array is in an arena with sports play-action or musical 
instruments at microphone level (L, R), and with good high-front (C) and back (SC) crowd 
sounds or ceiling ambience. 

Figure 3F ("k'"): the array is suspended in a cathedral with upstage choir (C) and
front-of-church organ divisions and floor reflections (L, R), antiphonal and congregation in back (SL,
SR), and organ trumpet overhead (SC).

After recording six PerAmbio 3D channels, given as {Pin} in 6x1 matrix form, a
"transformation" matrix {S}:

is applied to obtain the six ITU-compatible media channels {Sout} as follows:

{Sout} = {S} • {Pin} 



where: {S} is defined above, {Sout} is

For a standard ITU home theater surround system, a multi-channel disc (6 discrete
channel DVD-A, SACD, or DTS-CD/DVD-V) plays {Sout} directly in 5.1/6.1. If the speaker
layout is 5.1, current implementations sum SC information into SL and SR speaker feeds at
-3 dB.
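
The 5.1 fold-down described above can be sketched as follows. This Python fragment is a minimal illustration, assuming one sample per channel held in a dictionary keyed by the channel names used in this document (L, R, C, SL, SR, SC). The function name fold_sc_into_surrounds is hypothetical, and the sketch does not represent any particular receiver's implementation.

```python
# Linear gain equivalent to -3 dB (about 0.708).
MINUS_3_DB = 10 ** (-3.0 / 20.0)


def fold_sc_into_surrounds(frame):
    """Return a five-channel frame with SC summed into SL and SR at -3 dB.

    `frame` maps the six channel names L, R, C, SL, SR, SC to sample values.
    """
    out = {name: frame[name] for name in ("L", "R", "C", "SL", "SR")}
    out["SL"] += MINUS_3_DB * frame["SC"]
    out["SR"] += MINUS_3_DB * frame["SC"]
    return out


six_channel = {"L": 0.1, "R": 0.2, "C": 0.3, "SL": 0.4, "SR": 0.5, "SC": 1.0}
five_channel = fold_sc_into_surrounds(six_channel)
```

A -3 dB gain on each of two speakers keeps the total acoustic power of the surround-center signal approximately unchanged when the two feeds add incoherently in the room.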

When the user augments his system for 3D, a "reconstitution" matrix {P} is applied, 
which may be implemented in DSP, in response to flags in meta data that select one of six 
recording modes to recover losslessly PerAmbio 3D - in matrix form {Pout} - as follows:

{Pout} = {P} • {Sout}

Since matrix {P} is the inverse of matrix {S},

{Pout} = {S}^-1 • {Sout}

PerAmbio 3D reconstitution is lossless if

{Pout} = {Pin}.
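
The lossless round trip expressed by these equations can be sketched numerically. The Python fragment below is illustrative only: the 6x6 matrix S is a hypothetical invertible example, not one of the matrices of Figures 4A - 4F, and the helper names mat_vec and invert are assumptions. It applies {S} to a sample frame {Pin}, inverts {S} by Gauss-Jordan elimination to obtain {P}, and confirms that {Pout} matches {Pin} to within rounding error.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a column vector (list)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular, so no lossless inverse exists")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical transformation matrix {S}; NOT taken from Figures 4A - 4F.
S = [
    [1.0, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0, -0.5],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]

P_in = [0.3, -0.2, 0.5, 0.1, -0.4, 0.25]  # one sample frame of six recorded channels
S_out = mat_vec(S, P_in)                  # encoder: {Sout} = {S} . {Pin}
P = invert(S)                             # reconstitution matrix {P} = {S}^-1
P_out = mat_vec(P, S_out)                 # decoder: {Pout} = {P} . {Sout}

max_error = max(abs(a - b) for a, b in zip(P_out, P_in))
```

The essential requirement in this sketch is that {S} be invertible; a singular matrix would discard information and the invert helper would raise an error instead of returning a reconstitution matrix.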

Experiments have led to improved matrices for the six transformation modes depicted in 
Figures 3 A - 3F. These matrices are shown in Figures 4A - 4F, respectively. 

Decoder 

In order to play back the encoded channels in 3D, the encoded signals must be decoded. 
For example, if a user chooses to install 3D speakers, power amplifiers, etc., in order to 
reproduce the 3D sound field, a "reconstitution" decoder must also be added as shown in Figure
1A. The decoder applies the inverse of the transformation matrix, or "reconstitution matrix",
chosen for the recording. The reconstitution matrices for the transformation matrices in Figures
4A - 4F are shown in Figures 5A - 5F, respectively.

Speaker Arrangements 

Figure 6 depicts a recommended loudspeaker position for a preferred embodiment of the 
inventive system using 12 speakers. Another preferred embodiment uses ten speakers 
comprising all the speakers in Figure 6 with the exception of the BL and BR speakers. In the 
loudspeaker positions of the depicted embodiment, the present inventive system is compatible
with playing existing two dimensional recordings made in ITU 5.1 or 6.1 format: by moving backward
26% of the speaker diameter, the relative positions of the two dimensional speakers to the listener are
in full compliance with standard ITU-R775. Best results also require changing levels and delays
of the four to six speakers affected, which could be a programmable function of DSP in the 
receiver/audio controller. Thus, the present invention offers full forward as well as backward 
compatibility between two dimensional and three dimensional recordings for all home theater 
users both before they expand their systems to three dimensions and thereafter. 
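
The geometry behind the 26% figure can be checked with a short calculation. The Python sketch below rests on an interpretation that the document does not spell out: that the listener moves straight backward from the center of the speaker circle by 26% of the circle's diameter, and that the L, R and SL, SR speakers sit at 45 and 135 degrees on that circle as in the preferred arrangement. The function name apparent_azimuth is hypothetical.

```python
import math


def apparent_azimuth(speaker_azimuth_deg, shift_fraction_of_diameter=0.26):
    """Azimuth of a speaker as seen by a listener who has moved straight back.

    The speaker lies on a circle of radius 1 centered on the original
    listening position; the shift is given as a fraction of the diameter.
    """
    shift = 2.0 * shift_fraction_of_diameter   # diameter is two radii
    az = math.radians(speaker_azimuth_deg)
    forward = math.cos(az) + shift             # speaker is now farther ahead
    sideways = math.sin(az)                    # lateral offset is unchanged
    return math.degrees(math.atan2(sideways, forward))


front_pair_deg = apparent_azimuth(45.0)      # close to 30 degrees
surround_pair_deg = apparent_azimuth(135.0)  # close to 105 degrees
```

Under that interpretation the front pair appears at very nearly 30 degrees and the surround pair at about 105 degrees, which falls within the 100 to 120 degree range that the ITU recommendation allows for surround loudspeakers.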

In a preferred 10-speaker arrangement, the speakers are arranged as follows: 

The FL, FR speakers are positioned so that: 

azimuthally, one is approximately 8 degrees to the left of and the other is 
approximately 8 degrees to the right of the 12 o'clock position (i.e., 
directly in front) of a listener; and 

elevationally, both are positioned substantially on a horizontal plane that 
intersects the listener's ears. 

The L, R speakers are positioned so that: 

azimuthally, one is approximately 45 degrees to the left of and the other is
approximately 45 degrees to the right of the 12 o'clock position of the
listener; and

elevationally, both are positioned substantially on said horizontal plane.

The SL, SR speakers are positioned so that:

azimuthally, one is approximately 135 degrees to the left of and the other is
approximately 135 degrees to the right of the 12 o'clock position of the
listener; and 

elevationally, both are positioned substantially on said horizontal plane. 

The UL, UR speakers are positioned so that: 

azimuthally, one is approximately 90 degrees to the left of and the other is
approximately 90 degrees to the right of the 12 o'clock position of the
listener; and 

elevationally, both are positioned above said horizontal plane. 

The DL, DR speakers are positioned so that: 

azimuthally, one is approximately 90 degrees to the left of and the other is
approximately 90 degrees to the right of the 12 o'clock position of the
listener; and 

elevationally, both are positioned below said horizontal plane. 

In a preferred 12-speaker arrangement, two speakers are added to the above
arrangement as follows:

The BL, BR speakers are positioned so that:

azimuthally, one is approximately 172 degrees to the left of and the other is 
approximately 172 degrees to the right of the 12 o'clock position of a 
listener; and 

elevationally, both are positioned substantially on a horizontal plane that 
intersects the listener's ears. 
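
The speaker positions listed above can be collected into a small table for use in a decoder setup routine. The Python sketch below is a hedged illustration: the azimuths come from the text, but the document says only that the UL, UR and DL, DR speakers are above and below the horizontal plane, so the 45 degree elevation used here is an assumed placeholder. The names SPEAKERS, TEN_SPEAKER_LAYOUT, and unit_vector are hypothetical.

```python
import math

# Assumed placeholder: the text gives no elevation angle for the height speakers.
HEIGHT_ELEVATION_DEG = 45.0

# name: (azimuth in degrees, positive = left of 12 o'clock; elevation in degrees)
SPEAKERS = {
    "FL": (8.0, 0.0), "FR": (-8.0, 0.0),
    "L": (45.0, 0.0), "R": (-45.0, 0.0),
    "SL": (135.0, 0.0), "SR": (-135.0, 0.0),
    "UL": (90.0, HEIGHT_ELEVATION_DEG), "UR": (-90.0, HEIGHT_ELEVATION_DEG),
    "DL": (90.0, -HEIGHT_ELEVATION_DEG), "DR": (-90.0, -HEIGHT_ELEVATION_DEG),
    "BL": (172.0, 0.0), "BR": (-172.0, 0.0),
}

# The ten-speaker arrangement omits the BL and BR pair.
TEN_SPEAKER_LAYOUT = {k: v for k, v in SPEAKERS.items() if k not in ("BL", "BR")}


def unit_vector(azimuth_deg, elevation_deg):
    """Direction from the listener to a speaker as (forward, left, up)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el),
            math.sin(az) * math.cos(el),
            math.sin(el))


DIRECTIONS = {name: unit_vector(*angles) for name, angles in SPEAKERS.items()}
```

A table of this kind is one way a user-programmable decoder could record the actual loudspeaker locations during setup; the real elevation angles would replace the placeholder.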

Although the various aspects of the present invention have been described with respect to
their preferred embodiments, it will be understood that the present invention is entitled to
protection within the full scope of the appended claims.